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Abstract. A common problem, arising in many different applied contexts, consists in 
estimating the number of exponentially damped sinusoids whose weighted sum best fits 
a finite set of noisy data and in estimating their parameters. Many different methods 
exist to this purpose. The best of them are based on approximate Maximum Likelihood 
estimators, assuming to know the number of damped sinusoids, which is then estimated 
by an order selection procedure. It turns out that Maximum Likelihood estimators 
are biased in this specific case. The idea pursued here is to cope with the bias, by 
a stochastic perturbation method, in order to get an estimator with smaller Mean 
Squared Error than the Maximum Likelihood one. Moreover the problem of estimating 
the number of damped sinusoids and the problem of estimating their parameters are 
solved jointly. The method is automatic, provided that a few hyperparameters have 
been chosen, and faster than standard best alternatives. 
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Let's consider the model 

<i 

fR{t; q, Pr) = AjP] cos{27ru; jt + Oj), t e R+, uu ^ ouk V/i, k, (1) 

i=i 

Pr = {A„ p„uj„ 9„ j = l...,q}e R^" (2) 
and assume that we want to estimate q, Pr from the data 

ttk = fR{kA) + ek, k^O,...,n-l,n>Aq,AeR+ 

where A > is known, are i.i.d. zero-mean Gaussian variables with known variance 
(T^. In order to make the model /r identifiable from {a^} we assume that |a;j|A < tt, Vj. 
In fact if e.g. cUrA > tt it exists a) e [— tt, tt] such that a;^ A = ujA + 2'!rh, h e IV, h ^ 
and fR{t; q, Pr) = /^(t; g, P^) where = Pr \ {ur} [j{oj}. We notice that q, Pr) 
is a particular case of the complex model 

f{t;p,P) = j2c,ej, teR+, 

when p = 2g, g e IV, $>m(/) = and 

c, = lv^^^ = P,e^''^"^ J = 1, 

Therefore in the following we consider the problem of estimating P from the complex 
data (cfe, k — 0, . . . ,n — 1) with the identifiability condition \arg{^j)\A < tt where 
the noise are i.i.d. zero-mean complex Gaussian variables with known variance i.e. 
the real and imaginary parts of are independently distributed as Gaussian variables 
with variance and mean ^e[f{kA)], Q^m[/(fcA)] respectively. 

The problem described above arises in many fields. A certainly not exhaustive 
list is the following: noisy Hausdorff moment problem, numerical inversion of Laplace 
transform, noisy trigonometric moment problem, identification of constant coefficients 
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ODE from its transient response, approximation by complex exponentials functions, 
modal analysis, direction of arrival problem, shape from moments problem O [71 El [T3| 
[El [m ESI [27]. 

It is well known that the problem can be severely ill posed, depending on the 
relative location in the complex plane of the points = 1, . . . ,p and on the ratios 
SNRj = \cj\/a,j = 1, . . . ,p. A further difficulty is related to the fact that p is unknown. 
This means that when the ratios SNRj,j = 1, . . . ,p are bounded by some constant 
C < oo even if you are able to guess the right order p of the model, different realization 
of the process can give rise to quite different estimates of the other parameters in P. 
The difficulty of guessing the right order is related to the difficulty of estimating the other 
parameters. In fact if these were correctly estimated a good guess of p would minimize 
an order selection criterium such as AIC or BIG [2J. Unfortunately you cannot hope 
to get good estimates of the other parameters if p is not correctly estimated. Because 
of this situation many methods have been proposed to solve the problem by filtering 
the noise in different ways and/or considering different estimators. Those which provide 
the best performances, assuming to know the right order p, compute an approximation 
of the Maximum Likelihood estimator of the parameters filtering somewhat the noise 
at the same time [HI [HI [20]. The guess of the order is then used to build the noise 
filter and therefore to improve the estimates of the other parameters. Different guesses 
can then be tested in order to minimize e.g. BIG. An automatic procedure can then be 
devised. However it turns out that MLEs are biased. Therefore approximating them is 
not necessarily such a good idea especially in low SNRs cases. Moreover the methods 
which perform better need to solve one or more eigenvalues problems which can be very 
computationally expensive for large data sets. 

In this paper a black-box method which encompasses all these difficulties is proposed 
and experimentally compared with one of the best known standard method (GPOF [T8] ) 
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coupled with an order selection criterium (BIG). It exploits some recent results given 
in [3] where the bias of MLEs is used to improve the MSE by a stochastic perturbation 
approach as well as a method to estimate the distribution in the complex plane of the 

= l,...,p which are the most critical parameters. The problem is stated in a 
stochastic framework where the estimation of all the parameters in P can be addressed 
jointly. Moreover the method is fast and can be easily specialized to take into account 
prior information on the problem at hand. 

The paper is organized as follows. In section 1 the Maximum Likelihood and related 
estimators and their properties in this context are shortly reviewed. In section 2 the 
proposed method is described and some parameters required to make it automatic are 
discussed and estimated. In section 3 a simulation experiment is presented to compare 
the results provided by a standard method and the proposed one. 

1. Maximum Likelihood and related estimates 
1.1. Algebraic and statistical properties of MLE 

Maximum likelihood estimates Pml of the parameters P of the model f{t;p,P), 
assuming that p is known, are obtained by 

lla-/(t;p,P)||| 

Pml = argmaxp e ^ = argmiup \\a — f{t;p, P)\\2 

where a = [oq, . . . , On-i], t = [0, A, . . . , (n — 1)A]. In order to solve this nonlinear least 
squares problem, following [14J, we notice that the problem is separable. In fact we can 
split the parameters P in two sets P = Pdj P^^ where f(t; p, 7, () = Z]j=i IjCj- For sach 
fixed value C G let us consider the function 7(C) defined by 

7(0 = argmin^lla - /(t; J9, 7, C) II2 = argmin^(a - Vjj^ia - Vj) 
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where V = V{() is the Vandermonde matrix of order n x p of the vector (, H denotes 
transposition plus conjugation and J„ is the identity matrix of order n. It is proved in 
[14] that, substituting 7(^) in \\a — /(t; 7, C)ll2 minimizing w.r.to C, we get 

iuL = argmin^lla - /(t;p,7(C),C)ll2 = 
aTgmm^{a-V{V^V)-^V^a)^{a-V{V^V)-^V"a) = 

argmin^a^(J„ - V{V^Vy^V^)a 

and 

In order to study the properties of the ML estimator we start by noticing that 

Proposition 1 It does not exist an efficient estimator of the parameters P. Specifically 
the MLE of P is not efficient. 

Proof. We notice that the log-likelihood function is an absolute continuous function of 
P. Hence, by Corollary 3.1 and Theorem 3.1 of [21] if the variance of an estimator of P 
would attain the Cramer-Rao bound this would imply that the probability density 

1 ll<i-/(t;p,P)lli 

e 

of a would belong to the exponential family. But this is false because of the dependence 
of ah on S,^. □ 

1.2. Approximate MLE: complex exponentials interpolation 

We then consider the problem of interpolating the data a by means of a linear 
combination of complex exponential functions Cj, C,j j = 1, . . . , n/2, that is to find 
n complex numbers [7, Q = {'jj, = 1, • • • , n/2 such that a = V{()^. Equivalently 
we could consider the problem of building the Fade' approximation [n/2, n/2 — 1] to the 
Z— transform of ak, k = 0,1, . . . [Ml [7]. To this aim let us consider the Hankel matrix 
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pencil Ui — zUq, z & where 



C/o(a) = ?7(ao, . • . ,an-2), Ui{a) = U{ai, . . . ,a„-i) 



and 



•^n—l I 



Xi X2 ... Xn/2 

^2 ... Xn/2+1 



Xn/2+1 



Xn — 1 



It is well known (e.g. [IS]) that, provided that detUo 7^ 0,detUi 7^ 0, a unique solution 
exists which is given by C = ^q^^ where are the generahzed eigenvalues of the 
pencil Ui — zUq and 7 = Wq^q where Wqe is the matrix of generalized eigenvectors of 
Ui — zUq and T denotes transposition. Moreover it turns out that Wge = ^i^GE^~^ 
where V{C,^^) is the square Vandermonde matrix based on These properties can 
be easily checked by noticing that if a = V{()^ then 



Uo = vCocvCcf, u, = vCoczvCcf 



where 



C = diag{'ji, 'jn/2} and Z = diag{Ci, Cn/2} 

and therefore UiV{Q^'^ = UoV{Cy^Z which imphes that ( are the generalized 
eigenvalues of the pencil Ui — zUq. The relation between [0^/^,^^^^^] and [7,^] is given 
by 

Proposition 2 If n = 2p then [QML^iMJ = &3- 

Proof. Let he V = VCO- Substituting a = 1/7 in a^(/„ - V{V"V)-^V^)a we get 

f'V'iin - v{v''vy^v'')V2 = 0. 

But a"{In - V{V"V)-^V")a > 0, hence ||a - 7(C), Oil! takes its least possible 

value when V = V{() therefore C = C,^^^^ and 7 = (y^V)^^V^a = c^l- ^ 
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1.3. Bias ofMLE 

As an easy consequence of the Proposition above we show that the MLE can not have 
moments. In particular MLE can not have the mean, therefore bias can not be defined. 
Let us consider the case when n = 2,p = l,6i = uji = 0,|p| < 1. Therefore 

ao = A + eo, ai = Ap + ei, f/o = Oq, t^i = ai, Pml = —- 

The density of pml is then the density of the ratio of two independent Normal variables 
with means A and Ap respectively and variance a^. Assuming for simplicity that A = 1, 
it can be shown (e.g.|l_lj and references therein) that this density is given by 



pJx) = —— e ^ + \ ^ ^ Erf 



1 + 



_a^2(x2 + l)_ 

which is a Cauchy-like density and therefore moments do not exist. However in the 
general case we can consider the formal power series of ^^^j^, truncate it e.g. after the 
first term and compute the expectation, denoted, with abuse of notation, by E\^^^^. If 
we define as bias the quantity — ^|| we have 

Proposition 3 When n = 2p the MLE [cmli^j^j} (^f^ biased. 

Proof. If n = 2p by Proposition [2] [cMiiij^ij} = [iX] which solve the complex 
exponentials interpolation problem. However in [3l lemma2] it was proved that 
E [7] -c = o((t) and E[C] = o((t). □ 

Therefore when n = 2p the MLE are only asymptotically unbiased in the limit for 
cr ^ in a weak sense i.e. when expectations are computed w.r.t an approximation of 
the true density of the ML estimator. One could argue that letting n (yo for p fixed 
could improve the situation when cr > 0. This is not the case as we now show for the 
simplest case of the model at = p*-*^^^ + cj, t = 0, . . . ,n — 1 where |p| < 1 and are 
i.i.d. Gaussian zero- mean random variables with variance o"^. We have 
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Proposition 4 The density of the MLE of the parameter p can be approximated by a 
density pn{x) such that 



lim pn{x) = Poo{x), X e (-1, 1) 
and Poo (x) is a density such that 

\impooix;p,a) = 5{x - p) 

(in the sense of distributions). Moreover, for a > 0, Poo{x) is bimodal, the modes pi^2 
have opposite signs and liiiio-^oo Aii,2 = ±1- 

Proof. Because pml annihilates the gradient of the likelihood which is Gaussian, 
following pp , we can approximate the density of pm l for any n by considering a first order 
Taylor series approximation of pml around the point in which we want to approximate 
its density. We then get 



n—>oo 



lim Pn{x) = if \x\ > 1 




and 



n 




which exists because 




0, < a < C < oo. 



It then follows that 



n- 



lim Pn{x) = if |x| > 1 



and 
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where 

S{x)^—, TT^TT — — T^, R{x;p)^ 



^ '^^ (1 - X){1 + X){px - + X^) 

and 



exists and it is finite because 



lim {x - Ij^e-^^^^'^^y^ =0, 0<a<C<oo 



lim {x + l)h~^^^'''''^Js{x) = 0, 0<a<C<oo. 

x—*—l+ ' 

As R{x; p) has no real zeros in (—1, 1), we have 

Vx ^ p, X e (-1, 1) 
oo x — p 
therefore 



\imp^{x\p,a) = 5{x - p) 

<T — >U 

in the weak sense. To complete the proof let us consider the function 

/(x; p, cr) = - log 5* (x) - log (7 p) = cost. + log[poo {x) 



Its first derivative is 



o-2Qi(a;;p) Q2{x; p) 

where Pi{x; p),Qi{x; p), P2{x; p),Q2{x; p) are polynomials of orders 15,17,16,17 
respectively. When p — 0, a — 1 we get 

(x- 1)2(0; + 1)^(0;^ + 1)2 ' ^i'^2,r3>0 
which has three real roots {0, ±ri} e (—1, 1) corresponding to two local maxima and 
one minimum of /(x; 0, 1). By continuity it is easy to prove that the same holds for 
Poo{x; p,a). Moreover 

1- w/ X P2{x;p) 
lim / (x; p, a) = — — 
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and it turns out that 



P2{x; p) = 2x{x - + + + 2){xp - if 



therefore the real roots in (—1, 1) are {0, ±1} because 1/p > 1. □ 
We conclude that in the limit of the considered approximation the MSE is a decreasing 
function of n - because the support of Pnix) stretches from iR to (—1,1) as n increases 
- with an asymptotic value 



when cr > 0. Moreover the MLE are biased for all a > and MLE is a decreasing 
function of a because of the continuous dependence on a of Poo{x) and the presence 
of an interval of positive probability around a secondary mode far away from p, which 
disappears only in the limit a — > 0. In figures [T|3] simulation results are reported 
confirming this claims and hence the goodness of the approximation. Moreover the 
limit behavior for a oo - i.e. when the data are Gaussian white noise - of Poo{x) is 
the same, projected on the real line, as the condensed density [15\ of the Q obtained 
by solving the complex exponentials interpolation problem of a Gaussian white noise 
In figure m this behavior is illustrated by simulation. The connection between ML 
estimation and complex exponentials interpolation in this case justify the conjecture 
that the same relationship holds in the general case. As we know by [SI lemma2] that 
jj,Cj &re biased we can conjecture that also the MLE in the general case are biased. 
Moreover we can conjecture that for a moving from to oo the condensed density of the 
MLE of moves from a sum of Dirac's deltas centered on C,j to a density concentrated 
around the unit circle as happens for the (j. 

1.4- Standard pencil methods: GPOF 

Computation of MLE is usually complicated because the right model order p should be 
known and many local maxima are present when SNR is low or moderately large. In 
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literature many algorithms to get approximate MLE are present and their relative merits 
are usually measured in terms of the CR bound for the asymptotic unbiased estimators 
[9l[22]. This does not make much sense because we are interested in solving the problem 
when cr > but can help to compare algorithms. As expected because of the asymptotic 
unbiasdness, when the noise variance is less than a threshold, all algorithms produce 
reasonable estimates (see [13] for a comparison). Moreover some heuristic algorithms can 
exceed the CR bound (because of the bias) and hence it is suggested that the bias can 
help to decrease the noise threshold below which meaningful estimates can eventually 
be computed [22] . 

Moreover, because of the connection between ML estimation and complex 
exponential interpolation, many approximate ML algorithms are based on complex 
exponential interpolation of the data. The main advantages over the exact MLE 
algorithms are that no initialization must be provided and the computation is faster. 
The best of them include some sort of noise filtering in order to increase the SNR ratio. 
Cadzow method [ID] and GPOF [18] are examples of this approach. We give here a 
short summary of GPOF method because it is used in the proposed estimation procedure 
described in Section 2 and it will be used for comparisons in Section 3. Assuming that the 
data a are noisy and that we know the true number p of complex exponentials, the aim 
of GPOF is to estimate the non linear parameters ^j, j = 1, . . . ,p by solving a filtered 
generalized eigenvalue problem. When the data are noiseless we know that we can 
retrieve ^ by solving the complex exponential interpolation problem based on a square 
pencil of order pxp i.e. n = 2p data are enough. If we use n > 2p data and use a square 
pencil of order n/2 x n/2 the conditions detUo ^ 0, detUi 7^ to solve the problem and 
to get a unique solution are no longer satisfied because rank{Uo) = rank{Ui) = p < n/2. 
When noise is present it make sense to assume that n/2—p terms of the model represents 
the noise. Therefore we can solve the complex exponential interpolation problem of order 
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n/2 and then discard the n/2 — p terms associated e.g. with the lowest absolute values 
\cj\ of the weights. As an alternative we can first filter-out the noise from the pencil 
and then solve a complex exponential interpolation problem of order p. More generally 
we can assume that the model is made up of I terms, I — p oi them representing the 
noise, with p < I < n — p, i.e. a = V(C)7 where V{Q e (^("-Ox^ jg ^j^e Vandermonde 
matrix based on j — 1, . . . ,1. We notice that the larger I the smaller the number 
of equations n — I that we can form with n observations. By choosing / we can control 
how accurately to represent the noise and hence the signal, but the price to pay is on 
the number of constraints that can be considered. We can then consider a rectangular 
pencil Ui — zUq with 

where Vi(C) ^ (f'^""'-'^', t^2(C) ^ the Vandermonde matrices based on — 

1, . . . ,1 and 

C = diag{% . . . , 7,} and Z = diag{Ci, . . . , 0} 

and therefore t/iV^2(C)* = t/oV^2(C)*^ where = {X^ = {X^Y and denotes the 
generahzed inverse of X. Therefore ( are the generahzed eigenvalues of the rectangular 
pencil Ui — zUq. We want now to compute the signal related generalized eigenvalues by 
solving an eigenvalue problem of order p. To this aim let us define the data matrix 

Qq ai ... ai 

Oi 02 ... 



O-n-l-l O-n-l ■ ■ ■ dn-l 

from which we can retrieve Uq,Ui by 



p<l<n/2, 



(3) 



Uo = UEo, Ui = UEi, Eq = [ei, . . . ,6^], Ei = [63, . . . ,e;+i] 

where is the j— th column of the identity matrix J^+i. Let us consider then its singular 
value decomposition U = PDQ, P g (T^""') D ^(]]{n-i)x{i+i) ^ q ^^(i+i)x(i+i)_ 
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the noiseless case rank{U) = p therefore the last n — I ~ p elements on the diagonal of 
D are zero and U = P^D^Q^ where G GJ'p^p is obtained from D by dropping the 
last n — I — p rows or columns, E (^(""'^^p is obtained from P by dropping the last 
n — I — p columns and G is obtained from Q by dropping the last n — I — p 

rows. In the noisy case we can filter out the smallest n — l — p elements on the diagonal 
of D setting them to zero. But then the Hankel structure of = P^D^Q^ is lost. 
Cadzow [Idj suggests to retrieve this structure while filtering out the smallest singular 
values by the iteration: 

• [/(o) = U 

• for = 0, 1, . . . 

, f/(fc) ^ p(k) D(k)Q{k) 

• f/(^+i) = Hankel([/^) 

• if - t/C^) II < T] then stop 

• end 

where > is a small tolerance and the operator Hankel (A) maps the matrix A into 
the matrix obtained by substituting each element of a secondary diagonal of A by 
the average of the elements of that diagonal. In [lOj is proved that this iteration is a 
specific instance of a general method which converges under hypotheses that are verified 
in the case considered here. Denoting by P^D^Q^ the singular value decomposition 
of the Hankel matrix produced by the iteration we are therefore reduced to solve the 
rectangular [n — I) x / generalized eigenvalue problem 

P^D^Q^EiW = P^D^Q^EqWZ. 

We notice that P = P^D^ G (r("-Oxp h^s maximum rank p therefore its generalized 
inverse is = {P^ P)^^P^ . Therefore by left- multiplying by P^^ the problem above 
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reduces to the rectangular p x / generalized eigenvalue problem 

Q^EiW = Q^EoWZ 

and we are looking for the p non-zero rank reducing numbers ([, ■ ■ ■ ,Cp which solve 
this problem. We have 

Proposition 5 Let be Qo = (Q^Eo)^ E (T'^f, Qi = (Q^Ei)^ e (T'^p. Then 
the eigenvalues of QoQi G (T^^^ are rank reducing numbers of the rectangular pencil 
Q^E^ - zQ^Eq. 

Proof. Qo has maximum rank p, therefore its generalized inverse is Qq = {QqQo)^^Qo 
and QoQi is a square matrix of maximum rank p with non zero eigenvalues which are 
also the eigenvalues of (QoQi)^ = Qi (Qo)^ ■ But then there exist left eigenvectors 
such that 

yfiQ^E, - CfQ^'Eo) = 0, j = 1, . . . ,p, □. 

We notice that the singular value decomposition of U can be replaced by its PRQ rank 
revealing decomposition [T2] where P and Q are unitary matrices and i? is a trapezoidal 
matrix such that the absolute values on the diagonal are in decreasing order. In fact 
it turns out that in the noiseless case R is a trapezoidal matrix of rank p fTll Section 
7.3] and noise filtering can be performed by setting to zero the last n — I — p rows of R. 
Despite the obvious computational advantages this method is worse than the one based 
on svd for low SNRs because the best approximation property of svd does not hold. 

It turns out that the method is sensitive not only to the good choice of p but to 
the number n of data too, especially for low SNRs (see figure [2] where MSE for GPOF 
estimates are plotted as a function of n for moderately large noise). An easy heuristic 
argument can be provided for the damped sinusoids case. If kr is such that for k > kr 
the signal is decayed under the noise threshold (e.g G [— 3cr, 3cr], k > k^-) more noise 
is injected in the estimation method hiding the signal and worsening the quality of the 
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estimates. If n is too large it can be expected that the model order p is overestimated 
by any model order selection criterium such as e.g. BIC. This makes it difficult to 
distinguish between modes related to signal and modes related to noise. But even when 
n is chosen correctly, if the SNRs are moderate or small, the bias of the estimates can 
be the main responsible of the inability to resolve modes with close frequency. In this 
case the model order p is hkely to be underestimated by BIC and all the estimates of 
the other parameters are likely to be meaningless. 

2. The proposed method 

2.1. Outline 

From the discussion of the previous section, in order to propose an automatic method 
which improves on the bias affecting exact and approximate MLE, we start from the 
complex exponential interpolation problem, which is likely to capture the best features 
of MLE and exploits the ensemble behavior (as specified below) of its solution which is 
easier to study than the ensemble behavior of MLE. Specifically the basic observation 
which motivates the proposed method is the following. When SNRs are moderate or 
low the performances of a good standard algorithm, such as e.g. GPOF, measured 
by the MSE of the parameters vary significantly as a function of the noise realization 
used. For example for some noise realizations, two modes with close frequencies can be 
well separated even if SNRs are low, while for other noise realizations, with the same 
variance, this is not true. This means that the bias of the frequency estimates in some 
cases makes the two modes even closer than they are making it impossible to separate 
them while in other cases the opposite is true. The idea is then to base the inference 
on the ensemble behavior instead than on a single realization. However usually we have 
just one single data set. Therefore we propose to use it first to get information on the 
statistical distribution over the ensemble of the Q,j = 1, . . . , n/2 which are the critical 
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quantities which the parameter estimates are based on, and then to make use of the data 
again to get point and interval estimates of the parameters by a stochastic perturbation 
method. To this purpose the original problem is reformulated as the one of estimating 
the complex measure 

where D (ZW is a compact set, from its noisy moments 

Ofe = fikA) + ek, k^O,...,n-l. 

It turns out that 

Sk^ z''S{z)dz = jj^x + iyfS{x + iy)dxdy, A; = 0, 1, 2, . . . 

where 

Sk = tcj^j-fik^) (4) 
hence this problem is equivalent to the original one. We notice that S{z) is an atomic 
measure supported on the (unknown) points ^j, j — 1, . . . ,p. Estimating a set Q such 
that & fl, j = 1, ... ,p, is our first goal. 

2.2. The first step 

The idea is to make use of the relation, discussed in Section 1, between the numbers 
^j, j — 1, . . . ,p and the r.v. Q,j — 1, . . . ,n/2 which solve the complex exponential 
interpolation problem for the data a^, k = 0, . . . ,n — 1. More specifically we want to 
study the location in of the Q. As these are r.v. we are looking for a probability 
function h{z) defined on the complex plane such that 

2 n/2 



[ h(z)dz^-Y,nCkeN}, New. 



k=l 

The main reason to consider the Q is now apparent: as Q are the generalized 
eigenvalues of the pencil Ui{a) — zUo{a), they are the roots of the polynomial Q{z) = 
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det{Ui{a) — -2f/o(a)). But then h{z) is the condensed density of these roots which is 
given by (e.g. [5J): 



where A denotes the Laplacian operator with respect to x,y if z = x + iy and 



is the corresponding logarithmic potential and E is the expectation operator w.r.to the 
density of the a^. In the limit for o" — it can be shown [3] that h{z) tends weakly to 
a measure supported on the points ^j, j = 1, . . . ,p. Therefore our first goal is reached 
if we are able to compute the expectation in ([5]) and to cope with the fact that h{z) 
conveys the information on the j = 1, . . . ,p only in the limit for cr — > 0. In [1] a 
closed form approximation to h{z) based on a single realization {a^, k = 0, . . . ,n — 1} 
is provided. The logarithmic potential log(|(5(-2)P is approximated by a sum of powers 
of variables whose expectation can be computed in closed form. We then get 



where A is the discrete Laplacian evaluated on a square lattice £, Rlk{z) is the diagonal 
of the R factor in the QR factorization of Ui — zUq and (3 is an hyperparameter, to be 
discussed in the following, which control the smoothness of h{z) hence helping in coping 
with the noise. In fact, because of the limit property of h{z), if cr is small enough 
there exist disjoint sets N^, k = centered on ^fc, k = 1, . . . ,p, such that 

J]sf h{z)dz ~ 1, = Ufc^fc- Moreover it was shown in O |3] that h{z) can have 
other noise-related local maxima which are located close to the unit circle. However if 
there exist signal-related local maxima close to the unit circle they can be distinguished 
from the noise-related ones not only by their relative higher magnitude but also by 
the fact that they are surrounded by a set where h{z) ^ (gap of poles of the Fade' 
approximants [211 [25]). Increasing /3 will depress the local maxima of h{z) and will 
make larger the sets Nj. because h{z) is a probability density. Eventually some sets A^^ 



.) = -Au{z) 




(5) 




(6) 
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will merge together therefore determining a loss of resolution but the local noise-related 
maxima will be depressed too and therefore can be easily detected and filtered out by a 
simple thresholding technique which can also make use of the "gap of poles" property. 
Furthermore only a fraction n = 2p < n, p ^ p of data are used in this step in order to 
make an implicit noise filtering. Of course we loose in resolution but this is not relevant 
in this step. Summing up, in the first step of the procedure the data are used to identify 
the sets Nk, k = 1, . . . ,pn < p such that C,j E N = Ujt Vj = 1, . . . ,p. In figlS] top 
left the results obtained at the end of the first step are shown on a specific example 
described in Section 4. Three not intersecting sets Nh are computed which contains in 
their union the true generalized eigenvalues C,k, k = 1, . . . ,5. 

2.3. The second step 

Our second goal is to get point and interval estimates of the parameters. To this 
purpose a method based on the stochastic perturbation idea proposed in |3j is used. 
Pseudosamples are generated from {ofc, /c = 0, . . . , n — 1} by 



where vj^ are i.i.d. zero mean complex Gaussian variables with variance a'^ independent 
of a/j, V/i. The complex exponentials interpolation problem (CEIP) is solved for each 
of them. GPOF method is used with all data and I = njl^ p = p. The generalized 
eigenvalues are pooled and those not belonging to are discarded. Then a standard 
clustering method such as e.g. K-means ^23j is applied to the generalized eigenvalues 
belonging to by fixing to p the number of cluster to be estimated and initial centroids 
given by the solution of the CEIP problem for the original data. The clusters whose 
cardinality is not close to T are discarded because it was proved in that for each 
pseudosample it can be expected that in a small neighbor of each C,k,k = 1, ... ,p, it will 
fall at least one estimated generalized eigenvalue. The number of selected clusters is an 




ak + , k = 0, . . . ,n - 1; 



r = 



1 T 

X, . . . , J. 
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estimate p of p. In figE] top right and bottom left and right the big dots indicates the 
generahzed eigenvalues which belong to on a specific case and small dots indicates 
the generalized eigenvalues which do not belong to A^. We notice the presence of several 
spurious clusters of generalized eigenvalues which justify the importance of the first step 
of the procedure. The estimates C,k of C,k are then computed by averaging the generalized 
eigenvalues belonging to the A;— th cluster. The estimates Ck of are then computed by 
solving the standard least squares problem 



We notice that interval estimates of ^ and c can also be obtained from the clustering 
results. Remembering that we have noticed that approximate MLE can depend on the 
number of data n, the procedure outlined above can be improved by repeating the two 
steps for several values of n and choose the best one by using an order selection criterium 
such as e.g. BIG. 

2.4- Estimation of (3 

The first step of the procedure depends critically on the choice of (3. A value of (3 too 
small will give rise to many modes of h{z) which are likely to be spurious but not easily 
detectable as noise-related ones. A value of (3 too large will give rise to a small number 
of modes, possibly much less than p. The clustering method can then become critical. 
The idea for getting a good value for (3 is based on a comparison of formula ([6]) with 
another approximation of h{z) given in [3j by: 



c = argmin^ll V(|^)7 — a 



h{z) 



1 



A log(/i,(^)) 



2?™ 



where [Jij{z) are the eigenvalues of 
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where A{z,^) g (F"/^^'^/^ is a tridiagonal hermitian matrix with 1 + \z\'^ on the leading 
diagonal and — z and —z on the diagonals respectively below and above the leading one. 
As the numbers s given in (jlj) are unknown, this formula cannot be used to estimate 
h{z). However the following Proposition holds 

Proposition 6 If P = | then Vz, u{z,a) — u{z,a) = o(cr), as a 0, where 
u{z,a) and u{z,a) are the logarithmic potentials respectively ofh{z,a) and h{z,a) 

Proof. We notice that h{z; a)) = logdet(f/f/^ + f a^A) where U = Ui{s) - zUois). 
Let he U = QR the QR decomposition of U. As U = U'^ we also have U = RFQ^ 
and therefore UU^ = R^Q^QR = R^R because Q is unitary. But then h{z;a)) = 
^^A\ogdet{R^ R + ^a'^A). Because of the structure of A it is easy to show that 
det{A{z)) = > 1 as det{A{z)) is an increasing function of |z| and det(y4(0)) = 1. 

But then, from the general identity det{Z + XY^) = det(Z) det{I + Z-^X) it follows 
that 

det + ma^Z) = det (ma^A) det (/ + R{ma'^A)-^R") 

> det (ma^l) det (/ + Rima"^ ly^ R") = det + ma^l) 

= YlXk {RR" + ma'^l) = n [h{RR") + ma'^] = H [^k{R)^ + ma"^ 

where m = n/2, Xk{X) denotes the fc— th eigenvalue of X and ak{X) denotes its k—th. 
singular value. From [26l D.l,pg.228] we know that 

l-Rii, ■ ■ ■ ,R mm 

and, from [26l A.2,pg.ll6] it follows that Va > 

+ a, . . . , Rl,^ + a] Wi{Rf + a, . . . , am{Rf + «]• 

But then from A.2.c,pg.ll7] we have 

n [yk{Rf + ma^] > n [RI + ma'] ^ Y[ [e[RI] + ma 
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where E is the expectation operator, and 
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a)=Y: log UiRf + ma'] > ^ log \e[RI,] + 



ma 



> E 



log l^lk + 



L k 



u{z, a) 



k k 

the last inequality follows by a generalization of Jensen inequality to matrix functions 
defined on the set of Hermitian matrices [26l F.2.c,pg.476, E.6,pg.467]. 
We notice now that j4j, eq.6] 

"1 fE[Rl 



h{z; a, (3) 



2im 
1 

2?™ 
1 

2?™ 



n/2 / 

k=l \ 
n/2 / 

AE log 

k=l \ 
n/2 



^\k\ 



1 fE[R' 



kki 



2 V (t'P 
A^log E[Rl]+a'P 



1 



+ 1 



k=l 



hence, by choosing P = m, 



^log \E[Rl] + 



ma 



The thesis follows because u{z, a) — u{z, a) = o{a) as proved in \3, Th.3]. □ 
The Proposition above suggests to the initial guess for (3 and then to increase 

it by a little amount to get a smoother estimate of h{z) useful for estimating the set Q 
in the first step. In the following the value /3 = 0.6n is used. 



2.5. Filtering the QR decomposition 

It turns out that the first step of the procedure depends critically on the QR factorization 
of the matrix Ui — zUq or, as proved in [1], on that of the matrix U defined in ([3]). It 
is therefore necessary to filter out the noise from the R factor of U. This is a very 
delicate task which can however successfully accomplished by taking into account the 
special structure of the data as follows. We notice that the real and imaginary parts 
of the signal f{t) = J2^=iCjCj decay to zero exponentially. However when Gaussian 
noise is present the tail of the data fill a rectangular region centered on the of 
width ^ 2^/2a. A classic way to reduce the contribution of the noise consists therefore 
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in applying an exponential filter to force the tail of the data to go to zero as in the 
noiseless case. In section 2.4 we discussed the Cadzow iteration to filter out the noise 
in U without destroying its Hankel structure. However, in order to further improve the 
estimate of R, we suggest to apply a filter also after the factorization process. 

To this aim we notice first that, if f/ = QR, Q^Q = I, R upper trapezoidal, 
the main diagonal of R can be chosen to be non-negative and monotonic decreasing. 
In the noiseless case the last n — p rows of R must be zero, as rank{U) = p. It can 
be shown experimentally that the same behavior characterizes also the absolute value 
of the secondary diagonals {\Rh^h+i\, h = 1, . . . ,p — 1} , I = 0, ... ,p — 1. Moreover 
this behavior is preserved also in the noisy case but with an asymptotic value greater 
than zero. In figj6] the results of a simulation showing these facts are reported. A set 
of complex exponential signals were generated with random frequencies uj and phases 
6j with uniform distribution in [— 7r,7r), random decays pj with uniform distribution 
in (0, 1] and complex standard Gaussian random amplitudes normalized in order to 
make their absolute values to sum to one. The matrix U was then formed and the 
QR decomposition was computed. The absolute values of the diagonals of R were then 
averaged and the results for the main diagonal and the first three secondary diagonals 
was plotted. The same is done by adding complex Gaussian white noise to the complex 
exponential signals. 

The comparison of the results in the noiseless and noisy cases for several SNRs and 
orders p, suggests that we can filter out the noise in the diagonals of R by 

R 

Rh,h+i = /i = 7; > 0, / = 0, 

In fig(l6]) the filtered diagonals were plotted too where 7 was estimated by solving the 
problems 

7i = argmin^ ^ \Rh,h+i - Rh,h+i\, / = 0, . . . ,p - 1. 

h=l 

It can be noticed a good agreement between the noiseless and filtered data, therefore 
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the functional form of the filter seems to be adequate to do the job. In the following we 

choose only one hyperparameter 7 and filter the diagonals of R according to the rule 

~ _ Rh,h+i .-X 
^h,h+i - {() 

2.6. The algorithm 

Summing up, a sketch of the proposed algorithm is the following: 

• fix an overestimate p oi the true number of components p 

• compute U based on the first n = 2p data and filter it by Cadzow algorithm 

• compute U = QR and filter the diagonals of R by formula ([7]) 

• compute the Hessemberg matrices R{Ei — zEq), Wz & C and reduce them to 
triangular form by Givens rotations 

• compute h{z] (3), (3 = 0.6n, by formula ([6]) where Rkk{z) are the diagonal elements 
of the triangular matrices computed in the previous step 

• compute the sets A^^ such that 

— h{z; P) is unimodal for z & 

- nLiA^fc = 

by selecting the local maxima of h{z; (3) above a given threshold r > 0, and then 
by identifying the neighbor of the k-th local maxima such that (3) is 
monotonic decreasing along the four coordinate directions on the lattice £ starting 
from 4 

• generate T pseudosamples based on the original n data 

• solve the CEIP for each pseudosample by GPOF method with parameters I = n/2,p 
and pool the ^^'^^ 

• cluster the ^^'^^ € U ^fc and discard the others. The k-means method is used with 
p as the number of clusters, and the clusters with less than [aTj, a G (0.5,1] 
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elements are discarded 

• Pott = number of selected clusters 

• = average of the ^l^^ in cluster k-th, k = 1, . . . ,Pott 

• Cfc = average of the cf^^ in cluster fc-th, k = 1, . . . ,pott 

3. Simulation results 

In order to test the advantages of the proposed method w.r.to the standard ones, 
four experiments were performed corresponding to the four values of the noise s.d. 
o = 2-\/2, v^, In each experiment = 100 independent realizations of the r.v. 

a^^\ k = 1, . . . , n = 120, h = 1, . . . ,N were generated from the complex exponentials 
model with p = 5 components given by 

-0.3-i27r0.35 -0.1-j27r0.3 -0.05-i27r0.28 -0.0001+i27r0.2 -0.0001+i27r0.2l" 

c= [20,6,3,1,1] 

by adding complex Gaussian noise with s.d. a. We notice that the frequencies of the 4^*^ 
and 5*^ components are closer than the Nyquist frequency if n < 1/(0.21 — 0.20) = 100. 
By defining SNRi = we label the four considered cases by SNR = miuj SNRi = 

[0.5, 1, 3, 10]. For each experiment and for each h = 1, . . . , N the method GPOF [18] was 
applied with / = m/2, m = n/3, . . . ,n and p = 1/3, . . . , 1/2. For each estimate ((m,p) 
of the generalized eigenvalues, the corresponding estimates '-f{m,p) of the weights was 
obtained by solving a linear least squares problem. The optimal values {mott,Pott) of 
{m,p) were chosen by minimizing the BIG criterium [2]. The corresponding optimal 
parameters and Cj were then computed. \cj\ were then sorted in descending order 
and were sorted accordingly, cj and were then used to estimate the signal by 

Pott 
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If Pott > P, the relative error was computed by 

VP \r- - -12 
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E{a, h) 



Otherwise E{(T,h) was set to the conventional value —1. In fig. [7| the empirical 
distributions over the N replications of Potticr, ■) and E{a,-) were reported for each 
value of cr. It can be noted that the optimal model order is reasonably concentrated 
around the true value p = 5 only for the two smallest value of cr. Consequently also the 
relative error is reasonably small only in those cases. The average relative MSEs 

1 iV<T 



MSE{a) 



h=l 



where is the cardinality of the set {h\E{a, h) > 0}, are reported in the first row of 
Table [H In the second row the cardinalities N^r are reported. 





SNR = 0.5 


SNR = 1 


SNR=3 


SNR=10 


MSE{standard) 


1.4 


1.2 


0.3 


0.1 




18 


14 


68 


100 


MSE{proposed) 


1.2 


0.9 


0.4 


0.1 




62 


83 


95 


90 



Table 1. Standard method: relative MSEs (first row) averaged over Ncr (second row) 
replications. Proposed method: relative MSEs (third row) averaged over Nc^ (fourth 
row) replications. 



The new method was then applied to the same data. The algorithm illustrated in 
section 2.6 for cr = 2y/2, ^ and h = 1, . . . ,N was applied. However in this 

case only the first n = 80 data values were used, hence a super-resolution problem 
is involved in this case. After some trials and errors the following hyperparameters 
provide the best results: lattice dimension = 80, number of iterations of the Cadzow 
algorithm = 10, filter parameter 7 = 0.4, threshold for selecting the local maxima of 
the condensed density r = 2.e — 3, number of clusters for k-means p = 20, number of 
pseudosamples T = 20, ratio between standard deviation of pseudosamples and noise 
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standard deviation ^ = 0.15, acceptation threshold for clusters a = 0.75. Results are 
reported in fig. [HI However we remark that the hyperparameters values are not critical 
in the sense that the same qualitative results are obtained by slightly perturbing these 
values. 

The average relative MSEs and the corresponding cardinalities are reported in 
the third and fourth rows of Table [H 

The following advantages over the standard method can be noticed: 

• only 2/3 of data are needed to get the best results 

• the right order p is better estimated for low SNRs 

• the MSE is comparable for large SNRs and better for low SNRs 

We remark that the results reported in figlH] and Table [H were obtained by using the 
same hyperparameters reported above for all SNRs considered. A fine tuning of the 
hyperparameters could possibly further improve the results. Finally we notice that the 
standard method based on GPOF requires about six times more fioating point operations 
than the proposed method in the case considered. This is due to the fact that 0(n^/6) 
svd factorizations are required in the standard method to select the good order, while 
only T svd factorizations are needed in the proposed method. 

4. Conclusions 

A classic approximation problem which is at the core of many ill posed inverse problems 
arising in many application fields is revisited and a new stochastic approach is considered 
to overcome the drawbacks of standard methods. It turns out that some tools developed 
in the framework of the theory of random matrices, such as the condensed density of the 
(generalized) eigenvalues, provides a deep insight on the structure of the approximation 
problem. Coupling this information with a stochastic perturbation approach, the bias 
which affects standard estimators based on Maximum Likelihood can be controlled and 
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a solution with better statistical properties, than those provided by standard methods, 
can be computed. The price to pay is that a few hyperparameters must be identified 
in order to get an automatic estimation procedure. However the flexibility provided by 
the hyperparameters allows to extract the correct information in low SNR situations. 
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Figure 1. 

MSE as a function of the number of data. MSE of pml = —0.9 of the example in 
Section 1 computed by MonteCarlo simulation (10000 samples, a — 0.1) and numerical 
minimization (solid), MSE computed by numerical integration using the approximated 
density (dotted), MSE computed by MonteCarlo simulation and GPOF method with 
filter order 1 (dashed). 
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Figure 2. 

MSE as a function of the number of data. MSE of pml = —0.9 of the example in 
Section 1 computed by MonteCarlo simulation (10000 samples, a — 0.3) and numerical 
minimization (solid), MSE computed by numerical integration using the approximated 
density (dotted), MSE computed by MonteCarlo simulation and GPOF method with 
filter order 1 (dashed). 
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Figure 3. 

MSE of pml of the example in Section 1 as a function of the true value of the parameter 
and of the noise standard deviation, computed by numerical integration using the 
approximated density. 
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Figure 4. 

Density Poo (2;) of pml of the example in Section 1 (dotted). Empirical density computed 
by MonteCarlo simulation (10000 samples, a — 100, p = — 0.8,n — 500) and numerical 
minimization with uniformly distributed initial value in [—1, 1] (solid). Empirical density 
computed by MonteCarlo simulation and GPOF method with filter order 1 (dashed). 
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Figure 5. Top left: the sets Nj, j = 1,. . . ,pN, Pn = 3, SNR = 0.5; top right and 
bottom left and right: zoom of the sets Ni,N2,N3; the small dots are the generalized 
eigenvalues corresponding to each pseudosample; the big dots are the generalized 
eigenvalues falling in NiL) N2U N^; the "x" are the initial centroids of the clustering 

procedure; the "+" arc the estimated centroids; the "o" are the centroids of clusters 
with more than 0.75 • R points where R = 100 is the number of psudosamples. 
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Figure 6. 

Top right: the main diagonal of the matrix it! in the noiseless case (dotted), in the noisy 
case (dashed) and the filtered one (solid) are represented when SNR= 1, p — 5. Top 
left: the same for the absolute value of the first diagonal. Bottom left: the same for the 
absolute value of the second diagonal. Bottom right: the same for the absolute value of 
the third diagonal. 
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GPOF: distribution of tlie optimal model order in 100 samples 
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GPOF: distribution of the relative error w.r.t. the true parameters in 100 samples 



-1 



-0.5 



_n_L. 



0.5 



.1 ii. 



1.5 



SNR-0.5 
SNR=1 
SNR=3 
SNR=10 



2.5 



Figure 7. The standard method: the empirical distributions over the N repUcations 
of Po«(o-, •) (top) and E{a, •) (bottom) for a = 2^2, V2, ^, ^. 
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proposed method: distribution of tlie optimal model order in 100 samples 
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proposed method: distribution of the relative error w.r.t. the true parameters in 100 samples 
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Figure 8. The proposed method: the empirical distributions over the N repUcations 
of Po«(o-, •) (top) and E{a, •) (bottom) for a = 2^2, V2, ^, ^. 



